Exploitative and Exploratory Attention in a Four-Armed Bandit Task
Authors
Abstract
When making decisions, we are often forced to choose between something safe that we have chosen before and something unknown that is inherently risky but may provide a better long-term outcome. This problem is known as the Exploitation-Exploration (EE) Trade-Off. Most previous studies of the EE Trade-Off have relied on response data alone, leaving some ambiguity over whether uncertainty leads to truly exploratory behavior, or whether the pattern of responding simply reflects a simpler ratio choice rule (such as the Generalized Matching Law; Baum, 1974; Herrnstein, 1961). Here, we argue that the study of this issue can be enriched by measuring changes in attention (via eye gaze), which has the potential to disambiguate these two accounts. We find that when moving from certainty into uncertainty, the overall level of attention to stimuli in the task increases, a finding we argue lies outside the scope of ratio choice rules.
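The ratio choice rule mentioned above, the Generalized Matching Law (Baum, 1974), states that the ratio of responses tracks the ratio of reinforcements, scaled by a sensitivity exponent and a bias term. A minimal sketch of the two-alternative form (all parameter values here are illustrative, not taken from the paper):

```python
def generalized_matching(r1, r2, sensitivity=1.0, bias=1.0):
    """Generalized Matching Law (Baum, 1974):
    B1 / B2 = bias * (R1 / R2) ** sensitivity,
    where B1, B2 are response rates and R1, R2 reinforcement rates."""
    response_ratio = bias * (r1 / r2) ** sensitivity
    # Convert the predicted response ratio into a choice proportion for arm 1.
    return response_ratio / (1.0 + response_ratio)

# Strict matching (sensitivity = 1, bias = 1): with 30 vs. 10 reinforcements,
# the predicted choice proportion simply tracks relative reinforcement.
print(generalized_matching(30, 10))  # 0.75
```

Because the rule maps reinforcement ratios directly onto response ratios, it predicts choice allocation without any notion of information seeking, which is why attentional measures can, in principle, separate it from genuine exploration.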
Similar Resources
Bayesian and Approximate Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem
In this paper we investigate human exploration/exploitation behavior in a sequential decision-making task. Previous studies have suggested that people are suboptimal at scheduling exploration, and that heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subjects' knowledge and limitations into models of belief...
Taming Non-stationary Bandits: A Bayesian Approach
We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling that can be used in both rested and restless bandit scenarios. By applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the ...
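A discounted variant of Thompson Sampling of the kind described can be sketched as follows. This is an illustrative reconstruction, not the authors' exact formulation: Bernoulli arms with Beta priors, where the prior parameters are geometrically shrunk toward the initial prior each step so old observations lose influence; the discount factor and arm probabilities are assumptions.

```python
import random

def discounted_thompson(true_probs, steps=1000, gamma=0.95, seed=0):
    """Thompson Sampling for Bernoulli bandits with geometrically
    discounted Beta parameters, down-weighting old observations so the
    policy can track non-stationary reward probabilities."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta "success" parameters (start at the prior)
    beta = [1.0] * k   # Beta "failure" parameters
    total = 0
    for _ in range(steps):
        # Sample a plausible success rate per arm; play the highest sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total += reward
        # Discount every arm's parameters toward the prior value of 1,
        # then incorporate the new observation on the played arm.
        for i in range(k):
            alpha[i] = gamma * alpha[i] + (1 - gamma) * 1.0
            beta[i] = gamma * beta[i] + (1 - gamma) * 1.0
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total

# total_reward = discounted_thompson([0.2, 0.5, 0.8, 0.4])
```

The discounting keeps the posterior variance from collapsing, so the sampler never stops exploring, which is what makes the same rule usable for restless bandits.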
Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits
Fast adaptation to changes in the environment requires both natural and artificial agents to be able to dynamically tune an exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choice of the action that subjectively appears as best at a given moment) relative to exploratory choices (i.e. testing other actions that...
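A fixed proportion of exploratory versus exploitative choices, as described above, corresponds to an epsilon-greedy rule. A minimal sketch (the epsilon value and value estimates are illustrative assumptions):

```python
import random

def epsilon_greedy_choice(estimates, epsilon=0.1, rng=random):
    """With probability epsilon, explore a uniformly random arm;
    otherwise exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i])

# With epsilon = 0 the rule is purely exploitative and picks arm 1 here.
print(epsilon_greedy_choice([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```

Dynamically tuning the trade-off, as the abstract proposes, amounts to making epsilon itself a function of recent prediction error rather than a constant.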
Multi-armed Bandit Formulation of the Task Partitioning Problem in Swarm Robotics
Task partitioning is a way of organizing work that decomposes a task into smaller sub-tasks which can be tackled separately. Task partitioning can be beneficial in terms of reduced physical interference, increased efficiency, higher parallelism, and exploitation of specialization. However, task partitioning also entails costs in terms of coordination effort and overhea...